library(visNetwork)
library(tidyverse)
library(igraph)
library(plotly)
library(seriation)
library(tourr)
The Bombing group and the Non Bombing groups are seperated by colors Red and Blue respectively. There are more people in the non bombing group. The clusters found in each of the groups are as follows -
In the bombing group the two main clusters found are originated at “Jamal Zougam” and “Mohamed Chaoui”.There are also some other smaller clusters .
In the Non bombing group the main cluster found was originated at “Imad Eddin Barakat”. There are many other smaller clusters in the Non bombing group. From the plot it looks like Imad had direct links to the bombing group.
nodes<-read.table("trainMeta.dat")
colnames(nodes)<-c("label","group")
nodes$id<-rownames(nodes)
nodes<-nodes[,c(3,1,2)]
nodes$title<-nodes$label
nodes$color<-ifelse(nodes$group==1,"red","blue")
nodes<-data.frame(nodes)
links<-read.table("trainData.dat")
colnames(links)<-c("from","to","value")
links<-data.frame(links)
weight_nodes<-graph.data.frame(d=links,vertices=nodes,directed = F)
degree_nodes<-degree(weight_nodes,mode="all")
nodes$value<-degree_nodes[match(nodes$id,names(degree_nodes))]
q1<-visNetwork(nodes,links)%>%visPhysics(solver="repulsion") %>%
visOptions(highlightNearest = list(enabled = T,degree=0,
hover = T),nodesIdSelection=TRUE,
selectedBy = "group")
q1
“Jamal Zougam” (born 1973 in Tangier) was one of six men implicated in the 2004 Madrid train bombings. He was detained on 13 March 2004, accused of multiple counts of murder, attempted murder, stealing a vehicle, belonging to a terrorist organisation and four counts of carrying out terrorist acts. Spain’s El País newspaper reported that three witnesses testified to seeing him leave a rucksack aboard one of the bombed trains.
“Jamal Zougam” is the person with most number of connections in the network so he is the one who could spread information fastest.
q2<-visNetwork(nodes,links)%>%visPhysics(solver="repulsion") %>%
visOptions(highlightNearest = list(enabled = T,degree=1,
hover = T),nodesIdSelection=TRUE,
selectedBy = "group")
q2
Yes I think the cluster I found centered by “Jamal Zougam” is the one most easily spotted in this plot also. This helps us identify clusters easily.
nodes1<-nodes
ceb<-cluster_edge_betweenness(weight_nodes)
nodes1$group<-ceb$membership
visNetwork(nodes1,links)%>%visIgraphLayout(layout = "layout_nicely")%>%
visOptions(highlightNearest = list(enabled = T,degree=1,
hover = T),nodesIdSelection=TRUE, selectedBy = "group")
Yes the main clusters identified using the reordered heatmap are the same spotted in the step 1 and 3. The two main clusters prominant in this plot are centered by “Jamal Zougam” and “Mohamed Chaoui”. They had most number of connections with the Bombers and the Non bombers.
netm<-get.adjacency(weight_nodes,sparse = F)
colnames(netm)<-nodes$label
rownames(netm)<-nodes$label
rowdist<-dist(netm)
row_order<-seriate(rowdist,"HC")
order1<-get_order(row_order)
netm_reord<-netm[order1,order1]
plot_ly(z=~netm_reord,x=~colnames(netm_reord),
y=~rownames(netm_reord),type="heatmap")%>%
layout(title = " Madrid Bombing Heatmap for finding clusters")
China and US were the largest producers of Oil and Coal throughout. China had a sudden rise in production of Coal from 2000 onwards. US was the highest producer of Oil throughout.
oil_data = read.csv("Oilcoal.csv", header = TRUE, sep = ";", dec = ",")
oil_data$X = NULL
head(oil_data)
## Country Year Coal Oil Marker.size
## 1 US 1965 291.8264 548.933 1
## 2 US 1966 306.0005 575.664 1
## 3 US 1967 300.2215 595.761 1
## 4 US 1968 310.7278 635.452 1
## 5 US 1969 312.0096 667.791 1
## 6 US 1970 309.0609 694.590 1
p1 <- oil_data %>%
plot_ly(
x = ~Oil,
y = ~Coal,
color = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'markers',
text = ~Country,
hoverinfo = "text",
size = ~Marker.size
) %>%
layout(
xaxis = list(
title = "Oil",
zeroline = F
),
yaxis = list(
title = "Coal",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p1
I found that France and Germany had quiet similar motion patterns. Though germany was higher in Coal and Oil compared to France throughout but they were moving up and down together for the whole time span. There was a sudden fall in Oil in both the countries between 1980 - 1983. Both the countries had a large drop. The reason for the drop was the Oil Price collapse in that period. That is the reason all the countries were showing this sudden drop.
fil_d = oil_data[oil_data$Country == "France"| oil_data$Country == "Germany",]
p2 <- fil_d %>%
plot_ly(
x = ~Oil,
y = ~Coal,
color = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'markers',
text = ~Country,
hoverinfo = "text",
size = ~Marker.size
) %>%
layout(
xaxis = list(
title = "Oil",
zeroline = F
),
yaxis = list(
title = "Coal",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p2
It is easier to look for values for a specific time in a barplot compared to the bubble plot. This is the main advantage bar plot has over bubble plot. Comparing between countries is also easier in a bar plot.
The disadvantage of a bar plot over bubble plot is that we can analyse just a dimension of data in this. In bubble plot we can see how the points move compared to two dimensions of data, which is not possible in bar plot.
oil_data$oil_p = (oil_data$Oil/(oil_data$Oil+oil_data$Coal))*100
temp = oil_data[,]
temp$oil_p = 0
new_oil_data = rbind(oil_data, temp)
p3 <- new_oil_data %>%
plot_ly(
x = ~oil_p,
y = ~Country,
split = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'lines',
line = list(simplyfy = F, width = 20)
) %>%
layout(
xaxis = list(
title = "Oil_P",
zeroline = F
),
yaxis = list(
title = "Country",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p3
The elastic transition between values makes the movements more quick and precise as compared to the continuous movement. I found this to give a better understanding for the data movement. It had a pause in between the transitions which made it look better. These were advantages of using Elastic.
I did not find any disadvantages in this. I found it better than the continuous movement.
p4 <- new_oil_data %>%
plot_ly(
x = ~oil_p,
y = ~Country,
split = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'lines',
line = list(simplyfy = F, width = 20)
) %>%
layout(
xaxis = list(
title = "Oil_P",
zeroline = F
),
yaxis = list(
title = "Country",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE,
easing = "elastic"
)
p4
I found that the projection near the value 6.8 on the scale best defined the clusters. I was able to spot two clusters using this projection.
Yes the clusters currospond to different year ranges. The first cluster has year ranges from 1965 to 1980, and the second cluster has the range 1980 to 2007.
mat <- read.csv2("Oilcoal.csv",sep=";")
mat<-mat[,1:3]
mat <- mat %>%spread(Country, Coal)
mat_scaled <- rescale(mat[,2:9])
rownames(mat_scaled) <- mat[,1]
set.seed(12345)
#tour <- new_tour(mat, grand_tour(), NULL)
tour<- new_tour(mat_scaled, guided_tour(cmass), NULL)
steps <- c(0, rep(1/15, 200))
Projs<-lapply(steps, function(step_size){
step <- tour(step_size)
if(is.null(step)) {
.GlobalEnv$tour<- new_tour(mat_scaled, guided_tour(cmass), NULL)
step <- tour(step_size)
}
step
}
)
## Value 0.786 18.3% better (0.781 away) - NEW BASIS
## Value 0.882 13.0% better (0.550 away) - NEW BASIS
## Value 0.894 1.4% better (0.223 away) - NEW BASIS
## Value 0.903 1.0% better (0.246 away) - NEW BASIS
## Value 0.951 5.4% better (0.446 away) - NEW BASIS
## Value 0.969 1.8% better (0.264 away) - NEW BASIS
## Value 0.979 1.1% better (0.147 away) - NEW BASIS
## Value 0.981 0.2% better (0.049 away) - NEW BASIS
## Value 0.981 0.1% better (0.044 away)
## Value 0.982 0.1% better (0.045 away) - NEW BASIS
## Value 0.982 0.1% better (0.062 away)
## Value 0.983 0.2% better (0.076 away) - NEW BASIS
## Value 0.983 0.1% better (0.069 away)
## Value 0.983 0.1% better (0.066 away)
## Value 0.983 0.1% better (0.042 away)
## Value 0.983 0.1% better (0.057 away)
## Value 0.983 0.1% better (0.055 away)
## Value 0.983 0.1% better (0.054 away)
## Value 0.984 0.2% better (0.086 away) - NEW BASIS
## Value 0.984 0.0% better (0.024 away)
## Value 0.984 0.1% better (0.064 away)
## Value 0.984 0.1% better (0.080 away)
## Value 0.984 0.0% better (0.017 away)
## Value 0.984 0.0% better (0.017 away)
## Value 0.985 0.1% better (0.111 away) - NEW BASIS
## Value 0.985 0.0% better (0.022 away)
## Value 0.986 0.1% better (0.067 away) - NEW BASIS
## Value 0.986 0.0% better (0.025 away)
## Value 0.986 0.0% better (0.024 away)
## Value 0.986 0.0% better (0.072 away)
## Value 0.986 0.0% better (0.011 away)
## Value 0.986 0.0% better (0.022 away)
## Value 0.986 0.0% better (0.008 away)
## Value 0.986 0.1% better (0.077 away)
## Value 0.986 0.0% better (0.019 away)
## Value 0.986 0.0% better (0.014 away)
## Value 0.986 0.0% better (0.028 away)
## Value 0.986 0.0% better (0.047 away)
## Value 0.986 0.0% better (0.026 away)
## Value 0.986 0.0% better (0.040 away)
## Value 0.986 0.0% better (0.042 away)
## Value 0.987 0.1% better (0.199 away)
## Value 0.987 0.1% better (0.111 away)
## Value 0.986 0.0% better (0.069 away)
## Value 0.986 0.1% better (0.073 away)
## Value 0.986 0.0% better (0.021 away)
## Value 0.986 0.1% better (0.083 away)
## Value 0.986 0.0% better (0.033 away)
## Value 0.986 0.0% better (0.016 away)
## Value 0.986 0.0% better (0.026 away)
## Value 0.986 0.0% better (0.016 away)
## No better bases found after 25 tries. Giving up.
## Final projection:
## 0.675 0.091
## -0.098 0.746
## -0.059 0.211
## -0.084 -0.222
## -0.173 -0.488
## -0.693 0.118
## 0.116 0.077
## 0.020 -0.290
## Value 0.841 26.6% better (0.781 away) - NEW BASIS
## Value 0.931 11.1% better (0.661 away) - NEW BASIS
## Value 0.964 3.5% better (0.444 away) - NEW BASIS
## Value 0.971 0.8% better (0.111 away) - NEW BASIS
## Value 0.975 0.4% better (0.130 away) - NEW BASIS
## Value 0.977 0.2% better (0.073 away) - NEW BASIS
## Value 0.982 0.6% better (0.143 away) - NEW BASIS
## Value 0.984 0.3% better (0.086 away) - NEW BASIS
## Value 0.986 0.2% better (0.064 away) - NEW BASIS
## Value 0.986 0.0% better (0.035 away)
## Value 0.987 0.1% better (0.085 away)
## Value 0.986 0.0% better (0.019 away)
## Value 0.986 0.0% better (0.019 away)
## Value 0.986 0.1% better (0.050 away)
## Value 0.986 0.1% better (0.035 away)
## Value 0.986 0.1% better (0.042 away)
## Value 0.986 0.0% better (0.034 away)
## Value 0.987 0.1% better (0.061 away)
## Value 0.986 0.0% better (0.019 away)
## Value 0.986 0.0% better (0.036 away)
## Value 0.986 0.1% better (0.048 away)
## Value 0.986 0.0% better (0.026 away)
## Value 0.986 0.1% better (0.047 away)
## Value 0.986 0.0% better (0.039 away)
## Value 0.988 0.2% better (0.140 away) - NEW BASIS
## Value 0.988 0.0% better (0.028 away)
## Value 0.988 0.1% better (0.056 away)
## Value 0.988 0.1% better (0.047 away)
## Value 0.987 0.0% better (0.016 away)
## Value 0.988 0.0% better (0.029 away)
## Value 0.987 0.0% better (0.015 away)
## Value 0.989 0.2% better (0.248 away) - NEW BASIS
## Value 0.990 0.1% better (0.038 away)
## Value 0.989 0.0% better (0.038 away)
## Value 0.989 0.0% better (0.034 away)
## Value 0.989 0.0% better (0.065 away)
## Value 0.989 0.0% better (0.030 away)
## Value 0.989 0.0% better (0.042 away)
## Value 0.989 0.0% better (0.034 away)
## Value 0.990 0.1% better (0.077 away) - NEW BASIS
## Value 0.990 0.0% better (0.034 away)
## Value 0.990 0.0% better (0.026 away)
## Value 0.990 0.0% better (0.018 away)
## Value 0.990 0.0% better (0.034 away)
## Value 0.990 0.0% better (0.034 away)
## Value 0.990 0.0% better (0.012 away)
## Value 0.990 0.0% better (0.063 away)
## Value 0.990 0.0% better (0.010 away)
## Value 0.990 0.0% better (0.009 away)
## Value 0.990 0.0% better (0.023 away)
## Value 0.990 0.0% better (0.030 away)
## Value 0.990 0.1% better (0.087 away)
## Value 0.990 0.0% better (0.014 away)
## Value 0.990 0.0% better (0.012 away)
## Value 0.990 0.0% better (0.020 away)
## Value 0.990 0.0% better (0.052 away)
## Value 0.990 0.0% better (0.013 away)
## Value 0.990 0.0% better (0.020 away)
## Value 0.990 0.0% better (0.025 away)
## Value 0.990 0.0% better (0.017 away)
## Value 0.990 0.0% better (0.025 away)
## Value 0.990 0.0% better (0.018 away)
## Value 0.990 0.0% better (0.023 away)
## Value 0.990 0.0% better (0.029 away)
## No better bases found after 25 tries. Giving up.
## Final projection:
## 0.720 0.068
## -0.248 0.526
## 0.157 0.613
## 0.145 -0.459
## 0.016 -0.182
## -0.182 -0.308
## -0.375 -0.048
## -0.448 0.044
## Value 0.829 24.8% better (0.781 away) - NEW BASIS
## Value 0.896 8.6% better (0.464 away) - NEW BASIS
## Value 0.941 5.0% better (0.374 away) - NEW BASIS
## Value 0.961 2.2% better (0.497 away) - NEW BASIS
## Value 0.966 0.4% better (0.085 away) - NEW BASIS
## Value 0.973 0.9% better (0.178 away) - NEW BASIS
## Value 0.978 0.6% better (0.123 away) - NEW BASIS
## Value 0.979 0.1% better (0.041 away) - NEW BASIS
## Value 0.980 0.3% better (0.122 away) - NEW BASIS
## Value 0.983 0.3% better (0.154 away) - NEW BASIS
## Value 0.986 0.3% better (0.100 away) - NEW BASIS
# projection of each observation
tour_dat <- function(i) {
step <- Projs[[i]]
proj <- center(mat_scaled %*% step$proj)
data.frame(x = proj[,1], y = proj[,2], state = rownames(mat_scaled))
}
# projection of each variable's axis
proj_dat <- function(i) {
step <- Projs[[i]]
data.frame(
x = step$proj[,1], y = step$proj[,2], variable = colnames(mat_scaled)
)
}
stepz <- cumsum(steps)
# tidy version of tour data
tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)
# tidy version of tour projection data
proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)
ax <- list(
title = "", showticklabels = FALSE,
zeroline = FALSE, showgrid = FALSE,
range = c(-1.1, 1.1)
)
# for nicely formatted slider labels
options(digits = 3)
tour_dat <- highlight_key(tour_dat, ~state, group = "A")
tour <- proj_dat %>%
plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
add_text(text = ~variable) %>%
add_markers(data = tour_dat, text = ~state, ids = ~state, hoverinfo = "text") %>%
layout(xaxis = ax, yaxis = ax,title="Animated guided tour of Coal consumption per Country")
tour
knitr::opts_chunk$set(echo = TRUE)
library(visNetwork)
library(tidyverse)
library(igraph)
library(plotly)
library(seriation)
library(tourr)
nodes<-read.table("trainMeta.dat")
colnames(nodes)<-c("label","group")
nodes$id<-rownames(nodes)
nodes<-nodes[,c(3,1,2)]
nodes$title<-nodes$label
nodes$color<-ifelse(nodes$group==1,"red","blue")
nodes<-data.frame(nodes)
links<-read.table("trainData.dat")
colnames(links)<-c("from","to","value")
links<-data.frame(links)
weight_nodes<-graph.data.frame(d=links,vertices=nodes,directed = F)
degree_nodes<-degree(weight_nodes,mode="all")
nodes$value<-degree_nodes[match(nodes$id,names(degree_nodes))]
q1<-visNetwork(nodes,links)%>%visPhysics(solver="repulsion") %>%
visOptions(highlightNearest = list(enabled = T,degree=0,
hover = T),nodesIdSelection=TRUE,
selectedBy = "group")
q1
q2<-visNetwork(nodes,links)%>%visPhysics(solver="repulsion") %>%
visOptions(highlightNearest = list(enabled = T,degree=1,
hover = T),nodesIdSelection=TRUE,
selectedBy = "group")
q2
nodes1<-nodes
ceb<-cluster_edge_betweenness(weight_nodes)
nodes1$group<-ceb$membership
visNetwork(nodes1,links)%>%visIgraphLayout(layout = "layout_nicely")%>%
visOptions(highlightNearest = list(enabled = T,degree=1,
hover = T),nodesIdSelection=TRUE, selectedBy = "group")
netm<-get.adjacency(weight_nodes,sparse = F)
colnames(netm)<-nodes$label
rownames(netm)<-nodes$label
rowdist<-dist(netm)
row_order<-seriate(rowdist,"HC")
order1<-get_order(row_order)
netm_reord<-netm[order1,order1]
plot_ly(z=~netm_reord,x=~colnames(netm_reord),
y=~rownames(netm_reord),type="heatmap")%>%
layout(title = " Madrid Bombing Heatmap for finding clusters")
oil_data = read.csv("Oilcoal.csv", header = TRUE, sep = ";", dec = ",")
oil_data$X = NULL
head(oil_data)
p1 <- oil_data %>%
plot_ly(
x = ~Oil,
y = ~Coal,
color = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'markers',
text = ~Country,
hoverinfo = "text",
size = ~Marker.size
) %>%
layout(
xaxis = list(
title = "Oil",
zeroline = F
),
yaxis = list(
title = "Coal",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p1
fil_d = oil_data[oil_data$Country == "France"| oil_data$Country == "Germany",]
p2 <- fil_d %>%
plot_ly(
x = ~Oil,
y = ~Coal,
color = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'markers',
text = ~Country,
hoverinfo = "text",
size = ~Marker.size
) %>%
layout(
xaxis = list(
title = "Oil",
zeroline = F
),
yaxis = list(
title = "Coal",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p2
oil_data$oil_p = (oil_data$Oil/(oil_data$Oil+oil_data$Coal))*100
temp = oil_data[,]
temp$oil_p = 0
new_oil_data = rbind(oil_data, temp)
p3 <- new_oil_data %>%
plot_ly(
x = ~oil_p,
y = ~Country,
split = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'lines',
line = list(simplyfy = F, width = 20)
) %>%
layout(
xaxis = list(
title = "Oil_P",
zeroline = F
),
yaxis = list(
title = "Country",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE
)
p3
p4 <- new_oil_data %>%
plot_ly(
x = ~oil_p,
y = ~Country,
split = ~Country,
frame = ~Year,
type = 'scatter',
mode = 'lines',
line = list(simplyfy = F, width = 20)
) %>%
layout(
xaxis = list(
title = "Oil_P",
zeroline = F
),
yaxis = list(
title = "Country",
zeroline = F
)
) %>%
animation_opts(
redraw = FALSE,
easing = "elastic"
)
p4
mat <- read.csv2("Oilcoal.csv",sep=";")
mat<-mat[,1:3]
mat <- mat %>%spread(Country, Coal)
mat_scaled <- rescale(mat[,2:9])
rownames(mat_scaled) <- mat[,1]
set.seed(12345)
#tour <- new_tour(mat, grand_tour(), NULL)
tour<- new_tour(mat_scaled, guided_tour(cmass), NULL)
steps <- c(0, rep(1/15, 200))
Projs<-lapply(steps, function(step_size){
step <- tour(step_size)
if(is.null(step)) {
.GlobalEnv$tour<- new_tour(mat_scaled, guided_tour(cmass), NULL)
step <- tour(step_size)
}
step
}
)
# projection of each observation
tour_dat <- function(i) {
step <- Projs[[i]]
proj <- center(mat_scaled %*% step$proj)
data.frame(x = proj[,1], y = proj[,2], state = rownames(mat_scaled))
}
# projection of each variable's axis
proj_dat <- function(i) {
step <- Projs[[i]]
data.frame(
x = step$proj[,1], y = step$proj[,2], variable = colnames(mat_scaled)
)
}
stepz <- cumsum(steps)
# tidy version of tour data
tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)
# tidy version of tour projection data
proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)
ax <- list(
title = "", showticklabels = FALSE,
zeroline = FALSE, showgrid = FALSE,
range = c(-1.1, 1.1)
)
# for nicely formatted slider labels
options(digits = 3)
tour_dat <- highlight_key(tour_dat, ~state, group = "A")
tour <- proj_dat %>%
plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
add_text(text = ~variable) %>%
add_markers(data = tour_dat, text = ~state, ids = ~state, hoverinfo = "text") %>%
layout(xaxis = ax, yaxis = ax,title="Animated guided tour of Coal consumption per Country")
tour